Efficient Searching Top-k Semantic Similar Words
Authors
Abstract
Measuring the semantic similarity between words is an important issue because it is the basis for many applications, such as word sense disambiguation and document summarization. Although the problem has been explored for several decades, most studies focus on improving effectiveness, i.e., precision and recall. In this paper, we address the efficiency issue: given a collection of words, how to efficiently discover the top-k words most semantically similar to a query. This issue is very important for real applications, yet the existing state-of-the-art strategies cannot deliver reasonable performance. We propose efficient strategies for searching top-k semantically similar words and provide an extensive comparative experimental evaluation demonstrating their advantages over the state-of-the-art approaches.
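To make the problem statement concrete, the following is a minimal brute-force baseline, not the strategies proposed in the paper. The abstract does not name a similarity measure, so this sketch assumes each word has a feature vector and uses cosine similarity; the names `cosine`, `top_k_similar`, and `TOY_VECTORS`, and the vector values, are hypothetical and chosen only for illustration.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query, vectors, k, sim=cosine):
    """Brute-force baseline: score every candidate word, keep the k best.

    `sim` is an assumed stand-in for whatever semantic similarity
    measure is actually in use; it is not taken from the paper.
    """
    q = vectors[query]
    scored = ((sim(q, v), w) for w, v in vectors.items() if w != query)
    # heapq.nlargest keeps only k items in memory while scanning all candidates.
    return [w for _, w in heapq.nlargest(k, scored)]


# Hypothetical toy vectors, made up for illustration only.
TOY_VECTORS = {
    "car": [1.0, 0.0, 0.0],
    "automobile": [0.9, 0.1, 0.0],
    "truck": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 1.0],
}

print(top_k_similar("car", TOY_VECTORS, k=2))  # ['automobile', 'truck']
```

This baseline evaluates the similarity function once per word in the collection for every query, which is the cost that an efficient top-k strategy would aim to reduce.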
Similar resources
Efficient Concept-based Document Ranking
Recently, there has been increased interest in searching and computing the similarity between Electronic Medical Records (EMRs). A unique characteristic of EMRs is that they consist of ontological concepts derived from biomedical ontologies such as UMLS or SNOMED CT. Medical researchers have found that it is effective to search and find similar EMRs using their concepts, and have proposed sophisticate...
Wikipedia-Based Semantic Interpreter Using Approximate Top-k Processing and Its Application
Proper representation of the meaning of texts is crucial for enhancing many data mining and information retrieval tasks, including clustering, computing semantic relatedness between texts, and searching. Representing texts in the concept space derived from Wikipedia has received growing attention recently. This concept-based representation is capable of extracting semantic relatedness betwee...
Efficient Document Indexing Using Pivot Tree
We present a novel method for efficiently searching the top-k neighbors of documents represented in a high-dimensional space of terms, based on cosine similarity. Documents are mostly stored in a bag-of-words tf-idf representation. One of the most widely used ways of computing the similarity between a pair of documents is the cosine similarity between their vector representations, but cosine similarity is not a met...
The Semantic and Rhetorical Function of the Synonymous and Antonymous Concepts of “Infaq” in the Holy Quran
The syntagmatic (descriptive) semantic approach is an attempt to represent the words and their relations as they exist in the human mind. Applying this approach, the present paper seeks to provide a descriptive analysis of the concept of infaq and to explain the semantic and rhetorical function of the concepts that, having a syntagmatic relation with it, are sometimes use...
Fast Algorithms for Top-k Approximate String Matching
Top-k approximate querying on string collections is an important data analysis tool for many applications, and it has been exhaustively studied. However, the scale of the problem has increased dramatically because of the prevalence of the Web. In this paper, we aim to explore the efficient top-k similar string matching problem. Several efficient strategies are introduced, such as length aware a...
Journal title:
Volume, issue:
Pages: -
Publication date: 2011